ReBNet: Residual Binarized Neural Network
This paper proposes ReBNet, an end-to-end framework for training
reconfigurable binary neural networks in software and developing efficient
accelerators for their execution on FPGAs. Binary neural networks offer an intriguing
opportunity for deploying large-scale deep learning models on
resource-constrained devices. Binarization reduces the memory footprint and
replaces power-hungry matrix multiplications with lightweight XnorPopcount
operations. However, binary networks suffer from a degraded accuracy compared
to their fixed-point counterparts. We show that state-of-the-art methods
for optimizing the accuracy of binary networks significantly increase
implementation cost and complexity. To compensate for the degraded accuracy
while adhering to the simplicity of binary networks, we devise the first
reconfigurable scheme that can adjust the classification accuracy based on the
application. Our scheme improves classification accuracy by
representing features with multiple levels of residual binarization. Unlike
previous methods, our approach does not exacerbate the area cost of the
hardware accelerator. Instead, it provides a tradeoff between throughput and
accuracy, while the area overhead of multi-level binarization is negligible.

Comment: To appear in the 26th IEEE International Symposium on Field-Programmable Custom Computing Machines
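To make the residual-binarization idea concrete, here is a minimal NumPy sketch of multi-level binarization, where each level binarizes the residual left over by the previous levels. The per-level scaling factor is taken as the mean absolute residual here; ReBNet learns these factors during training, so that choice is our simplification, not the paper's exact method.

```python
import numpy as np

def residual_binarize(x, levels=2):
    """Approximate a real-valued tensor x as a sum of `levels` scaled
    binary tensors; each level binarizes the remaining residual."""
    approx = np.zeros_like(x)
    residual = x.copy()
    for _ in range(levels):
        gamma = np.abs(residual).mean()   # stand-in for a learned scale
        level = gamma * np.sign(residual)
        approx += level
        residual -= level
    return approx

x = np.random.randn(4, 4).astype(np.float32)
print(np.abs(x - residual_binarize(x, levels=1)).mean())  # 1-level error
print(np.abs(x - residual_binarize(x, levels=3)).mean())  # typically smaller
```

Each extra level adds one more binary tensor per feature, which is why the scheme trades throughput for accuracy rather than area.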
XONN: XNOR-based Oblivious Deep Neural Network Inference
Advancements in deep learning enable cloud servers to provide
inference-as-a-service for clients. In this scenario, clients send their raw
data to the server, which runs the deep learning model and sends the results back.
One standing challenge in this setting is to ensure the privacy of the clients'
sensitive data. Oblivious inference is the task of running the neural network
on the client's input without disclosing the input or the result to the server.
This paper introduces XONN, a novel end-to-end framework based on Yao's Garbled
Circuits (GC) protocol, which provides a paradigm shift in the conceptual and
practical realization of oblivious inference. In XONN, the costly
matrix-multiplication operations of the deep learning model are replaced with
XNOR operations that are essentially free in GC. We further provide a novel
algorithm that customizes the neural network such that the runtime of the GC
protocol is minimized without sacrificing the inference accuracy.
We design a user-friendly high-level API for XONN, allowing expression of the
deep learning model architecture at an unprecedented level of abstraction.
Extensive proof-of-concept evaluation on various neural network architectures
demonstrates that XONN outperforms prior art such as Gazelle (USENIX
Security'18) by up to 7x, MiniONN (ACM CCS'17) by 93x, and SecureML (IEEE
S&P'17) by 37x. State-of-the-art frameworks require one round of interaction
between the client and the server for each layer of the neural network,
whereas XONN requires a constant number of interaction rounds for any number of
layers in the model. XONN is the first to perform oblivious inference on Fitnet
architectures with up to 21 layers, suggesting a new level of scalability
compared with the state of the art. Moreover, we evaluate XONN on four datasets
to perform privacy-preserving medical diagnosis.

Comment: To appear in USENIX Security 2019
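The reason XNOR operations are "essentially free" in garbled circuits is the Free-XOR optimization, which evaluates XOR/XNOR gates without cryptographic work. The following minimal Python sketch shows only the plaintext arithmetic identity XONN builds on, not the protocol itself: the inner product of two {-1, +1} vectors packed as bit-masks reduces to an XNOR followed by a popcount.

```python
def xnor_popcount_dot(a_bits, b_bits, n):
    """Inner product of two {-1,+1} vectors of length n, each packed as an
    n-bit integer (bit 1 encodes +1, bit 0 encodes -1):
        dot(a, b) = 2 * popcount(XNOR(a, b)) - n
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # mask back to n bits
    return 2 * bin(xnor).count("1") - n

# a = +1,-1,+1,+1 -> 0b1011 ; b = +1,+1,-1,+1 -> 0b1101
print(xnor_popcount_dot(0b1011, 0b1101, 4))  # (+1)+(-1)+(-1)+(+1) = 0
```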
Improving vision-inspired keyword spotting using dynamic module skipping in streaming conformer encoder
Using a vision-inspired keyword spotting framework, we propose an
architecture with input-dependent dynamic depth capable of processing streaming
audio. Specifically, we extend a conformer encoder with trainable binary gates
that allow us to dynamically skip network modules according to the input audio.
Our approach improves detection and localization accuracy on continuous speech
using the top-1000 most frequent LibriSpeech words, while maintaining a small memory
footprint. The inclusion of gates also reduces the average amount of processing
without affecting the overall performance. These benefits are shown to be even
more pronounced on the Google Speech Commands dataset placed over background
noise, where up to 97% of the processing is skipped on non-speech inputs,
making our method particularly interesting for an always-on keyword
spotter.
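A minimal PyTorch sketch of the gating idea follows: a trainable gate scores the input and a straight-through estimator keeps the hard skip/keep decision differentiable. The pooling, gate placement, and module contents here are our illustrative assumptions rather than the paper's exact architecture, and in practice a compute penalty on the gate probabilities would be needed to encourage skipping.

```python
import torch
import torch.nn as nn

class GatedModule(nn.Module):
    """Wrap a sub-module with an input-dependent binary gate (illustrative)."""

    def __init__(self, module: nn.Module, dim: int):
        super().__init__()
        self.module = module
        self.gate = nn.Linear(dim, 1)  # scores the time-pooled input

    def forward(self, x):  # x: (batch, time, dim)
        p = torch.sigmoid(self.gate(x.mean(dim=1)))  # keep probability
        hard = (p > 0.5).float()                     # binary skip/keep decision
        g = hard + p - p.detach()                    # straight-through estimator
        # g == 0 makes the block an identity; at inference one would branch
        # on `hard` before calling self.module to actually save the compute
        return x + g.unsqueeze(-1) * self.module(x)

# usage: gate a small feed-forward module over (batch, time, dim) features
block = GatedModule(nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8)), dim=8)
y = block(torch.randn(2, 50, 8))  # same shape as the input
```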
HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words
Streaming keyword spotting is a widely used solution for activating voice
assistants. Deep Neural Networks with Hidden Markov Model (DNN-HMM) based
methods have proven to be efficient and widely adopted in this space, primarily
because of the ability to detect and identify the start and end of the wake-up
word at low compute cost. However, such hybrid systems suffer from loss metric
mismatch when the DNN and HMM are trained independently. Sequence
discriminative training cannot fully mitigate the loss-metric mismatch due to
the inherently Markovian style of operation. We propose a low-footprint CNN
model, called HEiMDaL, to detect and localize keywords in streaming conditions.
We introduce an alignment-based classification loss to detect the occurrence of
the keyword along with an offset loss to predict the start of the keyword.
HEiMDaL shows a 73% reduction in detection metrics, along with equivalent
localization accuracy, at the same memory footprint as existing DNN-HMM-style
models for a given wake-word.
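The following hypothetical sketch illustrates what an alignment-based detection loss combined with a start-offset loss could look like; all names, shapes, and windowing choices are our assumptions, not HEiMDaL's exact formulation.

```python
import torch
import torch.nn.functional as F

def detection_and_offset_loss(logits, offsets, label, start_frame, window):
    """logits: (T,) per-frame keyword scores; offsets: (T,) predicted
    frames-since-start; label: 1 if the keyword occurs; start_frame:
    ground-truth start; window: frames after the start treated as positive."""
    T = logits.shape[0]
    target = torch.zeros(T)
    if label:
        end = min(start_frame + window, T)
        target[start_frame:end] = 1.0  # alignment-based positive region
    det_loss = F.binary_cross_entropy_with_logits(logits, target)
    off_loss = 0.0
    if label:
        # each positive frame predicts how far back the keyword started,
        # letting the detector localize the start at inference time
        frames = torch.arange(start_frame, end, dtype=torch.float32)
        off_loss = F.l1_loss(offsets[start_frame:end], frames - start_frame)
    return det_loss + off_loss
```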
GeneCAI: Genetic Evolution for Acquiring Compact AI
In the contemporary big data realm, Deep Neural Networks (DNNs) are evolving
towards more complex architectures to achieve higher inference accuracy. Model
compression techniques can be leveraged to efficiently deploy such
compute-intensive architectures on resource-limited mobile devices. Such
methods involve various hyper-parameters that require per-layer customization
to ensure high accuracy. Choosing such hyper-parameters is cumbersome as the
pertinent search space grows exponentially with model layers. This paper
introduces GeneCAI, a novel optimization method that automatically learns how
to tune per-layer compression hyper-parameters. We devise a bijective
translation scheme that encodes compressed DNNs to the genotype space. The
optimality of each genotype is measured using a multi-objective score based on
accuracy and the number of floating-point operations. We develop customized genetic
operations to iteratively evolve the non-dominated solutions towards the
optimal Pareto front, thus capturing the optimal trade-off between model
accuracy and complexity. The GeneCAI optimization method is highly scalable and can
achieve a near-linear performance boost on distributed multi-GPU platforms. Our
extensive evaluations demonstrate that GeneCAI outperforms existing rule-based
and reinforcement learning methods in DNN compression by finding models that
lie on a better accuracy-complexity Pareto curve.
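A minimal sketch of the search loop may help: genotypes hold one compression hyper-parameter per layer, fitness is the (accuracy, FLOPs) pair, and each generation keeps the non-dominated front. The pruning-rate genotype, the mutation operator, and the `evaluate` callback (which would compress, fine-tune, and measure the model) are illustrative stand-ins, not GeneCAI's bijective encoding or exact genetic operators.

```python
import random

def random_genotype(num_layers):
    # one compression hyper-parameter per layer (here: a pruning rate)
    return [random.uniform(0.0, 0.9) for _ in range(num_layers)]

def mutate(g, sigma=0.05):
    return [min(0.9, max(0.0, r + random.gauss(0, sigma))) for r in g]

def dominates(s, t):
    # s, t are (accuracy, flops): higher accuracy and lower FLOPs win
    return s[0] >= t[0] and s[1] <= t[1] and s != t

def evolve(evaluate, num_layers, pop=20, gens=10):
    """evaluate(genotype) -> (accuracy, flops); the expensive step."""
    population = [random_genotype(num_layers) for _ in range(pop)]
    for _ in range(gens):
        scored = [(g, evaluate(g)) for g in population]
        front = [g for g, s in scored
                 if not any(dominates(t, s) for _, t in scored)]
        # refill the population with mutations of non-dominated solutions
        population = front + [mutate(random.choice(front))
                              for _ in range(pop - len(front))]
    return front
```

Because each genotype is evaluated independently, the per-generation evaluations parallelize naturally, which is the source of the near-linear multi-GPU scaling the abstract mentions.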
End-to-end Customization of Efficient, Private, and Robust Neural Networks
Advancements in machine learning (ML) algorithms, data acquisition platforms, and high-end computer architectures have fueled unprecedented industrial automation. An ML algorithm captures the dynamics of a task by learning an abstract model from domain-specific data. Once the model is trained, it can perform the underlying task with relatively high accuracy. This thesis focuses on Deep Neural Networks (DNNs), a modern class of ML models that have shown promising performance in various applications. Thanks to DNNs, the breadth of automation has been expanded to tasks that were formerly too complex to be performed by computers; nowadays DNNs form the foundation of applications such as voice recognition, medical image analysis, and face authentication, to name a few.

Despite DNNs' benefits, their deployment in real-world applications may be circumscribed by several factors. First, DNNs are computationally complex, and their efficient execution on resource-constrained edge devices is a critical challenge. Second, users of DNN-based applications are often required to expose their data to the service provider, which may violate their privacy. Third, DNN models may fail to function correctly in the presence of malicious attackers. With these challenges in mind, it is a paramount task to design DNN-based systems that are efficient to execute, ensure users' privacy, and are robust to malicious attacks.

This dissertation provides holistic customization techniques that pave the way for efficient, private, and robust DNN inference. The key contributions of the thesis are as follows:

Efficiency: Development of encoded DNNs, a new family of memory-efficient neural networks. The thesis author's contributions provide customization techniques that enable the incorporation of nonlinear encoding into the computation flow of neural networks. An end-to-end framework is introduced to facilitate encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms.
Efficiency: Introducing the concept of lookup-table-based execution of encoded neural networks. The proposed method replaces floating-point multiplications with lookup-table searches (a sketch of this idea appears after this list). A memory-based hardware architecture is then proposed to execute the lookup-based multiplications and accelerate encoded DNN inference.
Privacy: Establishing customized solutions for oblivious inference, where a client holds a data sample and a server holds a DNN model. After running the oblivious inference protocol, the client receives the inference result without revealing her input to the server. This thesis proposes automated customization solutions that speed up oblivious inference while maintaining high inference accuracy.

Robustness: Development of solutions for online detection of neural Trojan triggers, a class of malicious attacks that cause a DNN to perform faulty inferences. The thesis proposes a novel methodology that enhances robustness to Trojan attacks by leveraging dictionary learning and sparse approximation (a sketch of this idea also appears below).
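As a rough illustration of the two Efficiency contributions, the sketch below clusters a layer's weights into a small codebook (a simple nonlinear encoding) and then replaces every multiplication in a matrix-vector product with a table lookup. Both the k-means-style encoding and the lookup layout are our simplifications; the thesis's actual encoding, fine-tuning flow, and memory-based hardware architecture differ.

```python
import numpy as np

def kmeans_codebook(w, bits=3, iters=25):
    """Cluster a layer's weights into 2**bits shared values and store a
    per-weight codebook index (a simple nonlinear encoding)."""
    k = 2 ** bits
    flat = w.ravel()
    centers = np.quantile(flat, np.linspace(0, 1, k))  # spread initial codebook
    for _ in range(iters):
        idx = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if (idx == j).any():
                centers[j] = flat[idx == j].mean()
    return idx.reshape(w.shape), centers

def lut_matvec(idx, centers, x):
    """Replace multiplies with lookups: precompute x_i * c_j for every
    input element and codebook entry, then gather and sum."""
    table = np.outer(x, centers)  # (len(x), 2**bits) precomputed products
    return table[np.arange(len(x))[None, :], idx].sum(axis=1)

w, x = np.random.randn(16, 32), np.random.randn(32)
idx, centers = kmeans_codebook(w, bits=3)
print(np.abs(w @ x - lut_matvec(idx, centers, x)).max())  # encoding error
```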
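And for the Robustness contribution, a generic sketch of the dictionary-learning idea: a dictionary D is assumed to have been learned offline on activations of clean inputs, and an input whose activation cannot be sparsely approximated well (large residual) is flagged as a potential Trojan trigger. The OMP routine and the residual-ratio score are our assumptions, not the thesis's method, and the detection threshold would be application-specific.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with k atoms of D,
    returning the final residual."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return residual

def trojan_score(D, activation, k=5):
    """High score = activation poorly explained by the clean dictionary."""
    return np.linalg.norm(omp(D, activation, k)) / np.linalg.norm(activation)

# toy demo: a clean activation in the dictionary's span vs. a perturbed one;
# the clean score should come out noticeably smaller
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128)); D /= np.linalg.norm(D, axis=0)
clean = D[:, :5] @ rng.normal(size=5)
print(trojan_score(D, clean), trojan_score(D, clean + rng.normal(size=64)))
```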